Splice site prediction using stochastic regular grammars.

Authors

  • A Y Kashiwabara
  • D C G Vieira
  • A Machado-Lima
  • A M Durham

Abstract

This paper presents a novel approach to splice site prediction based on stochastic grammar inference. We used four grammar inference algorithms to infer 1465 grammars and used 10-fold cross-validation to select the best grammar for each algorithm. The selected grammars were embedded into a classifier that was used to predict splice sites, and the results were compared with those of NNSPLICE, the predictor used by the Genie gene finder. We indicate possible ways to improve the classifier's performance by using Sakakibara's windowing technique to find probability thresholds that lower the number of false-positive predictions.
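
As an illustration of the general idea only, the sketch below scores a nucleotide string with a small stochastic regular grammar, written as a probabilistic finite automaton, and accepts it as a candidate site when its log-probability reaches a threshold. The states, probabilities, threshold, and the names `TRANSITIONS`, `END_PROB`, `log_probability`, and `is_candidate_site` are all hypothetical; they are not the grammars, algorithms, or thresholds inferred in the paper.

```python
import math

# Hypothetical toy grammar, NOT one of the paper's inferred grammars.
# Each state maps a nucleotide to (next_state, probability). Together with
# the stopping probability in END_PROB, the outgoing probabilities of every
# state sum to 1, so the automaton defines a distribution over strings.
TRANSITIONS = {
    "S": {"A": ("S", 0.2), "C": ("S", 0.2), "G": ("G1", 0.3), "T": ("S", 0.2)},
    "G1": {"A": ("S", 0.1), "C": ("S", 0.1), "G": ("G1", 0.05), "T": ("D", 0.7)},
    "D": {"A": ("D", 0.2), "C": ("D", 0.2), "G": ("D", 0.2), "T": ("D", 0.2)},
}
END_PROB = {"S": 0.1, "G1": 0.05, "D": 0.2}


def log_probability(sequence, start_state="S"):
    """Return the log-probability that the toy grammar generates `sequence`."""
    state = start_state
    total = 0.0
    for symbol in sequence:
        rule = TRANSITIONS[state].get(symbol)
        if rule is None:
            # Symbol outside the alphabet: the grammar cannot generate it.
            return float("-inf")
        state, prob = rule
        total += math.log(prob)
    # Account for the probability of stopping in the final state.
    return total + math.log(END_PROB[state])


def is_candidate_site(sequence, log_threshold):
    """Accept the sequence when its log-probability reaches the threshold."""
    return log_probability(sequence) >= log_threshold


if __name__ == "__main__":
    threshold = math.log(0.01)  # assumed value, chosen only for the demo
    for seq in ("GT", "AA"):
        print(seq, log_probability(seq), is_candidate_site(seq, threshold))
```

Raising the threshold trades recall for fewer false positives, which is the kind of adjustment the abstract refers to; how the thresholds are actually chosen with the windowing technique is not described here.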

Similar resources

PreRkTAG: Prediction of RNA Knotted Structures Using Tree Adjoining Grammars

Background: RNA molecules play many important regulatory, catalytic and structural ...

Stochastic modeling of RNA pseudoknotted structures: a grammatical approach

MOTIVATION Modeling RNA pseudoknotted structures remains challenging. Methods have previously been developed to model RNA stem-loops successfully using stochastic context-free grammars (SCFG) adapted from computational linguistics; however, the additional complexity of pseudoknots has made modeling them more difficult. Formally a context-sensitive grammar is required, which would impose a large...

Feature subset selection for splice site prediction

MOTIVATION The large amount of available annotated Arabidopsis thaliana sequences allows the induction of splice site prediction models with supervised learning algorithms (see Haussler (1998) for a review and references). These algorithms need information sources or features from which the models can be computed. For splice site prediction, the features we consider in this study are the presen...

Accurate Computation of the Relative Entropy Between Stochastic Regular Grammars

Works dealing with grammatical inference of stochastic grammars often evaluate the relative entropy between the model and the true grammar by means of large test sets generated with the true distribution. In this paper, an iterative procedure to compute the relative entropy between two stochastic deterministic regular grammars is proposed. ...

Stochastic Context-Free Grammars and RNA Secondary Structure Prediction

This thesis focuses on the prediction of RNA secondary structure using stochastic context-free grammars (SCFG). The RNA secondary structure prediction problem consists of predicting a 2-dimensional structure from a 1-dimensional nucleotide sequence. The theory behind SCFG is explained and an overview of the research literature on various methods in the field of secondary structure prediction is g...

Journal title:
  • Genetics and molecular research : GMR

Volume 6, Issue 1

Pages: -

Publication date: 2007